perm filename EVALU[AM,DBL]1 blob
sn#372416 filedate 1978-08-08 generic text, type T, neo UTF8
.NSECP(Evaluating AM)
.BEGIN TURN ON "{⎇"
.QQ
All mathematicians are wrong at times.
-- Maxwell
.ESS
This chapter contains discussions "meta" to AM itself.
First comes an essay about judging the performance of a system like AM.
This is a very hard task, since AM has no "goal". Even using current mathematical
standards, should AM be judged on what it produced, or the quality of the
path which led to those results, or the difference between what it started with
and what it finally derived?
Section {SECNUM⎇.2 then deals with the capabilities and limitations of AM:
.B48; ONCE PREFACE 0;
⊗8# ⊗* What concepts can be elicited from AM now?
Which others, with a little tuning or a few tiny additions?
⊗8# ⊗* What are some notable omissions in AM's behavior? Can the user elicit these?
⊗8# ⊗*
What could probably be done within a couple of months of modifications?
⊗8# ⊗*
Aside from a total change of domain, what kinds of activities does AM lack
(e.g., proof capabilities)? Are any discoveries (e.g., analytic function theory)
clearly beyond its design limitations?
.E
Finally, all the conclusions will be gathered together, and a short summary of
this project's `contribution to knowledge' will be tolerated.
.END
.EVALU: SECNUM ;
.SSEC(Judging Performance)
One may view AM's activity as a progression from an initial core of knowledge
to a more sophisticated
.if false then start;
"final"$$ As has been stressed before, AM has no
fixed goal, no "final" state. For practical purposes, however, the totality of
explorations by AM is about the same as the "best run so far"; either of these can be
thought of as defining what is meant by the "final" state of knowledge. $
.end;
body of concepts and their facets.
Then each of the following is a reasonable way to measure success, to "judge" AM:
.BN SKIP 1; TURN ON "{⎇";
λλ By AM's ultimate achievements. Examine the list of
concepts and methods AM developed.
Did AM ever discover anything interesting yet unknown to the user?$$
The "user" is a human who works with AM interactively, giving it hints, commands,
questions, etc.
Notice that by "new" we mean new to the user, not new to Mankind.
This might occur if the user were a child, and AM discovered
some elementary facts of arithmetic.
This is not really
so provincial: mathematicians take "new" to mean new to Mankind, not
new in the Universe. I feel philosophy slipping in, so this footnote is
terminated. $ Anything new to Mankind?
λλ By the character of the difference between the initial and final states.
Progressing from set theory to number theory is much more impressive than progressing
from two-dimensional geometry to three-dimensional geometry.
λλ By the quality of the route AM took to accomplish these advances:
How clever, how circuitous,
how many of the detours were quickly identified as such and abandoned?
.BREAK
λλ By the character of the User--System interactions: How important is the user's
guidance? How closely must he guide AM? What happens if he never says anything?
When he does want to say something, is there an easy way to express that to AM,
and does AM respond well to it?
Given a reasonable kick in the right direction, can AM develop the mini-theories
which the user intended, or at least something equally interesting?
λλ By its intuitive heuristic powers: Does AM believe in "reasonable" conjectures?
How accurately does AM estimate the difficulty of tasks it
is considering?
Does AM tie together (e.g., as analogous) concepts which are formally unrelated
yet which benefit from such a tie?
λλ By the results of the experiments described in
Section {[2] EXPT⎇.{[2] EXPTSSEC⎇ (beginning on page {[3] EXPTPAGE⎇).
How "tuned" is the worth numbering scheme? The task priority rating scheme?
How fragile is the set of initial concepts and heuristic rules?
How domain-specific are those heuristics really? The set of facets?
λλ By the very fact that the kinds of experiments outlined in Section
{[2]EXPT⎇.{[3]EXPTSSEC⎇ can
easily be "set up" and performed on AM.
Regardless of the experiments' outcomes,
the features of AM which allow them to be carried
out at all are worthy of note.
λλ By the implications of this project. What can AM suggest about educating
young mathematicians (and scientists in general)?
What can AM say about doing math? about empirical research in general?
λλ By the number of new avenues for research and experimentation it opens up.
What new projects can we propose?
λλ By comparisons to other, similar systems.
.E
.ONCE TURN ON "{⎇"
For each of these {BNN⎇ measuring criteria,
a subsection will now be provided.
.if false then start;
, to illustrate (i) the biggest
achievement and (ii) the biggest failure of AM along each dimension, and
(iii) to
try to objectively characterize AM's performance according to that measure.
.end;
Other measures of judging performance exist$$
For example, Colby sent transcripts of a session with PARRY to various
psychiatrists, and had them evaluate each interaction along several dimensions.
The same kind of survey could be done for AM.
A quite separate measure of AM would be to wait and see how many
future articles in the field refer to this work (and in what light!).
$, of course, but haven't been
applied to AM.
. SSSEC(AM's Ultimate Discoveries)
Two of the ideas which AM proposed were totally new and unexpected:$$
Note that these are "ultimate discoveries" only in the sense of what has been done
at the time of writing this thesis. For one of AM's ideas to be "new",
it should be previously unknown to both the author and the user. Why?
If the author knew about it, then the heuristics he provided AM with
might unconsciously
encode a path to that knowledge. If the user knew about that idea, his
guidance might unconsciously help AM to derive it. An even more stringent
interpretation would be that the idea be hitherto unknown to the collective
written record of Mathematics. $
.MAXDIVPAGE: PAGE
.MAXDIVSEC: SECNUM
.MAXDIVSSEC: SSECNUM
.BN
λλ Consider numbers with an abnormally high number of divisors. If
d(n) represents the number of divisors of n,$$e.g., d(12) =
Size(α{1,2,3,4,6,12α⎇) = 6. $ then AM defines the set of
"maximally-divisible numbers" to be ⊗6α{nεN | (∀m<n) d(m)<d(n)α⎇⊗*.
By factoring each such number into primes, AM noticed a regularity in
them. The author then developed a "mini-theory" about these numbers.
It later turned out that Ramanujan had already proposed that very
same definition (in 1915), and had found that same regularity. His
results only partially overlap those of AM and the author, however, and
his methods are radically different.
λλ ⊗1AM found a cute geometric application of Goldbach's conjecture.
Given the set of all angles of prime degree, from 0 to 180↑o,$$Included
are 0↑o and 1↑o, as well as the "typical" primes 2↑o, 3↑o, 5↑o, 7↑o,
11↑o,..., 179↑o. $ ⊗4any⊗* angle between 0 and 180 degrees can
be approximated to within 1↑o by adding a pair of angles from this
prime set. In fact, it is hard to find smaller sets than this one
which approximate any angle to that accuracy.
.E
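Both discoveries are easy to recheck by machine. Here is a sketch in Python (AM itself was written in LISP; the function names below are ours, not AM's) which recomputes the first few maximally-divisible numbers from the definition above, and verifies the Goldbach-style claim that every integral angle from 0 to 180 degrees lies within 1 degree of a sum of two angles of "prime" degree (with 0 and 1 included, per the footnote):

```python
def d(n):
    """Number of divisors of n, e.g. d(12) = |{1,2,3,4,6,12}| = 6."""
    return sum(1 for k in range(1, n + 1) if n % k == 0)

def maximally_divisible(limit):
    """AM's definition: those n such that every m < n has d(m) < d(n)."""
    best, found = 0, []
    for n in range(1, limit + 1):
        if d(n) > best:       # best is the running maximum of d(m), m < n
            best = d(n)
            found.append(n)
    return found

def is_prime(n):
    return n > 1 and all(n % k for k in range(2, int(n**0.5) + 1))

# "Prime degrees" here include 0 and 1 as well as 2, 3, 5, ..., 179.
prime_degrees = [0, 1] + [p for p in range(2, 180) if is_prime(p)]
pair_sums = {a + b for a in prime_degrees for b in prime_degrees}

# The claim: every angle 0..180 is within 1 degree of some pair-sum.
covered = all(any(abs(t - s) <= 1 for s in pair_sums) for t in range(181))

maximally_divisible(50)   # [1, 2, 4, 6, 12, 24, 36, 48]
```

The first entries agree with Ramanujan's "highly composite numbers", and the coverage test comes out true, as the text asserts.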
By and large, the other concepts which AM developed were either
already-known, or real losers. For example,
AM composed Set-insert with the predicate Equality.
The result was an operation
Insert-o-Equal(x,y,z), which first tested whether x was Equal to y or
not. The value of this was either True or False.
.if false then start;
$$ Actually, in LISP,
it was easier to have such results always be either T or NIL$
.end;
Next,
this T/F value was inserted into z. For example,
Insert-o-Equal({1,2⎇,{3,4⎇,{5,6⎇) = {False,5,6⎇. The first two
arguments are not equal, so the atom `False' was inserted into the
third. Although hitherto "unknown", this operation would clearly be better off
left in that state.
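What that composition amounts to can be sketched as follows (in Python rather than AM's LISP; the names are ours, not AM's): Insert-o-Equal applies Equality to its first two arguments and then inserts the resulting truth-value into the third:

```python
def equal(x, y):
    """The predicate Equality: its value is True or False."""
    return x == y

def set_insert(element, s):
    """Set-insert: add an element to a set."""
    return s | {element}

def insert_o_equal(x, y, z):
    """The composition: insert the truth-value Equal(x, y) into z."""
    return set_insert(equal(x, y), z)

# {1,2} and {3,4} are not equal, so False is inserted into {5,6}:
insert_o_equal({1, 2}, {3, 4}, {5, 6})   # {False, 5, 6}
```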
Another kind of loser occurred whenever AM entered upon some
"regular" behavior. For example, if it decided that Compose was
interesting, it might try to create some examples of compositions. It
could do this by picking two operations and composing them. What
better operations to pick than Compose and Compose! Thus
Compose-o-Compose would be born. By composing that with itself, an
even more monstrous operation is spawned:
Compose-o-Compose-o-Compose-o-Compose. Since AM actually uses the
word "Compose" instead of that little infix circle, the PNAME of the
data structure it creates is horrendous. Its use is almost
nonexistent: it must take 5 operations as arguments, and it returns a
new operation which is the composition of those five.
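The arity explosion can be sketched directly (in Python rather than AM's LISP; the names are ours, not AM's). Composing the binary Compose with itself yields a three-argument operation; composing that with itself yields the five-argument monster described above:

```python
def compose(f, g):
    """Ordinary binary Compose: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

def compose_o_compose(f, g, h):
    """Compose-o-Compose: the inner Compose's result fills the first
    argument slot of the outer one, so THREE operations are consumed."""
    return compose(compose(f, g), h)

def compose4(f, g, h, i, j):
    """(Compose-o-Compose) o (Compose-o-Compose): the inner call eats
    f, g, and h; its result fills the outer call's first slot, which
    still needs i and j -- FIVE operations in all."""
    return compose_o_compose(compose_o_compose(f, g, h), i, j)

inc = lambda x: x + 1
compose4(inc, inc, inc, inc, inc)(0)   # five increments: 5
```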
An analogous danger is that AM will be content to conjecture
a stream of very similar relationships (e.g., the multiplication table).
In all such cases, AM must have meta-rules which pull it up out of such
whirlpools, to perceive a higher generalization of its previous sequence
of related activities.
In summary, then, we may say that AM produced a few winning ideas new to the
author, a couple of which were new to Mankind.
Several additional "new" concepts were
created which both AM and the user agreed were better forgotten.
.if false then start;
AM and its descendents will remain merely ⊗4tools⊗* -- intelligent
scientific instruments -- assisting human scientists.
major, although this is deceptive since AM lacks the ⊗4breadth⊗* of
abilities any human being possesses.
.end;
. SSSEC(The Magnitude of AM's Progress)
.QQ
Even with men of genius, with whom the birth rate of hypotheses
is very high, it only just manages to exceed the death rate.
-- W. H. George$$ Quoted from [Beveridge 50]. $
.ESS
We can ask the following kind of question: how many "levels" did AM
ascend? This is a fuzzy notion, but basically we shall say
that a new level is reached when a valuable new bunch of connected
concepts are defined in terms of concepts at a lower level.
For example, AM started out knowing about Sets and Set-operations.
When it progressed to numbers and arithmetic, that was one big step
up to a new level. When it zeroed in on primes, unique-factorization,
and divisibility, it had moved up another level.
When fed simple geometry concepts, AM moved up one level when it
defined some generalizations of the equality of geometric figures
(parallel lines, congruent and similar triangles, angles equal in
measure) and their invariants (rotations, translations, reflections).
The above few examples are unfortunately exhaustive: that just about
sums up the major advances AM made. Its progress was halted not so
much by cpu time and space, as by a paucity of meta-knowledge:
heuristic rules for filling in new heuristic rules. Thus AM's
successes are finite, and its failures infinite, along this
dimension.
A more charitable view might compare AM to a human who was forced to
start from set theory, with AM's sparse abilities. In that sense,
perhaps, AM would rate quite well. The "unfair" advantage it had was
the presence of many heuristics which themselves were gleaned from
mathematicians: i.e., they are like compiled hindsight. A major
purpose of mathematics education in the university is to instill
these heuristics into the minds of the students.
AM is thus characterized as possessing heuristics which are powerful
enough to take it a few "levels" away from the kind of knowledge it
began with, but ⊗4only⊗* a few levels. The limiting factors are (i)
the heuristic rules AM begins with, and more specifically (ii) the
expertise in recognizing and compiling new heuristics, and more
generally (iii) a lack of real-world situations to draw upon for
analogies, intuitions, and applications.
. SSSEC(The Quality of AM's Route)
.QQ
Thinking is not measured by what is produced, but rather is a property of
the way something is done.
-- Hamming
.ESS
No matter what its achievements were, or the magnitude of its
advancement from initial knowledge, AM could$$
not necessarily WOULD be so judged. Humans may very well consider an incredible
number of silly ideas before the right pair of "hooked atoms" collide into a
sensible thought, which is then considered in full consciousness.
If, like humans, AM were capable of doing this processing in a sufficiently brief period of
real time, it would not reflect ill on its evaluation. Of course, this may simply
be the DEFINITION of "sufficiently brief".
$ still be judged
"unintelligent" if, e.g., it were exploring vast numbers of absurd
avenues for each worthwhile one it found. The quality of the route AM
followed is thus quite significant.
.if false then start;
AM performed better in this respect than expected. It is not obvious$$
Or whether that even makes sense to consider.
Comparisons with mathematicians would be desirable, but are beyond the scope
of this investigation.
$ how well a
human would have fared under similar circumstances.
.end;
Of the two hundred
new concepts it defined, about 130 were acceptable -- in the sense
that one can defend AM's reasoning in at least exploring them; in the
sense that a human mathematician might have considered them. Of these
"winners", about two dozen were significant -- that is, useful,
catalytic, well-known by human mathematicians, etc. Unfortunately,
the sixty or seventy concepts which were losers were ⊗4real⊗* losers.
.if false then start;
In this respect, AM fell far below the standards a mathematician
would set for acceptable behavior: ⊗4all⊗* his failures should have
at least seemed promising at the beginning.
Half of AM's adventures
were poorly grounded, and (perhaps due to a lack of intuition) AM
bothered with concepts which were "obviously" trivial: the set of
even primes, the set of numbers with only one divisor, etc. The
human mathematician would momentarily consider many poor courses of
action, whereas AM on the other hand managed to avoid truly lunatic
activities without even momentary consideration of them. But a human
would only spend a significant amount of time on very promising
tasks, and AM wasted a huge amount of time on tasks which a human
would have quickly recognized as dead-ends.
.end;
Once again we must observe that the quality of the route is a
function of the quality of the heuristics.
.if false then start;
If there are many clever
little rules, then the steps AM takes will often seem clever and
sophisticated. If the rules superimpose nicely, joining together to
collectively buttress some specific activity, then their
effectiveness may surprise -- and surpass -- their creator.
Such moments of great insight (i.e., where AM's reasoning surpassed
mine) did occur, although rarely. Both of AM's "big discoveries"
started by its examining concepts I felt weren't really interesting.
For example, I didn't like AM spending so much time worrying about
numbers with many divisors; I "knew" that the converse concept of
primes was infinitely more valuable. And yet AM saw no reason to give
up on maximally-divisible numbers; it had several good reasons for
continuing that inquiry (they were the converse to primes which had
already proved interesting, their frequency within the integers was
neither very high nor very low nor very regular, their definition was
simple, they were extremals of the interesting operation
"Divisors-of", etc., etc.) Similarly, I "knew" that Goldbach's
conjecture was useless, so I was unhappy that AM was bothering to try
to apply it in the domain of geometry. In both cases, AM's reasons
for its actions were unassailable, and in fact it did discover some
interesting new ideas both times.
Sometimes AM's behavior was displeasing, even though it wasn't
"erring". Occasionally it was simultaneously developing two
mini-theories (say primes and maximally-divisibles). Then it might
pick a task or two dealing with one of these topics, then a task or
two dealing with the other topic, etc. The task picked at each moment
would be the one with the highest priority value. As a theory is
developed, the interestingness of its associated tasks go up and
down; there may be doldrums for a bit, just before falling into the
track that will lead to the discovery of a valuable relationship.
During these temporary lags, the interest value of tasks related to
the other theory's concepts will appear to have a higher priority
value: i.e., better reasons supporting it. So AM would then skip over
to one of ⊗4those⊗* concepts, develop it until ⊗4its⊗* doldrums, then
return to the first one, etc. Most humans found this behavior
unpalatable$$ Although it might be the "best" from a dynamic
management point of view, it probably would be wrong in the long run.
Major advances really do have lulls in their development. $ because
AM had no compunction about skipping from one topic to another.
Humans have to retune their minds to do this skipping, and therefore
treat it much more seriously. For that reason, AM was given an extra
mobile reason to use for certain tasks on its agenda: "focus of
attention". Any task with the same kind of topic as the ones just
executed are given this extra reason, and it raises their priority
values a little. This was enough sometimes to keep AM working on a
certain mini-theory when it otherwise would have skipped somewhere
else.
The above "defect" is a cute little kind of behavior AM exhibited
which was non-human but not clearly "wrong". There were ⊗4genuine⊗*
bad moments also, of course. For example, AM became very excited$$
Please excuse this anthropomorphism. Technically, we may say that
the priority value of the best job on the agenda is the "level of
excitement" of AM. 700 or higher is called "excitement", on a scale of 0-1000.
$ when the conjunction of "empty-set" and other concepts
kept being equal to empty-set. AM kept repeating conjunctions of this
form, rather than stepping back and generalizing this data into a
(phenomenological) conjecture. Similar blind looping behavior
occurred when AM kept composing Compose with itself, over and over.
In general, one could say that "regular" behavior of any kind signals
a probable fiasco. A heuristic rule to this effect halted most of
these disgraceful antics. This rule had to be careful, since it was
almost the antithesis of the "focus of attention" idea mentioned in
the preceding paragraph. Together, those two rules seem to say that
you should continue on with the kind of thing you were just doing,
but not for ⊗4too⊗* long a time.
The moments of insight were 2 in number; the moments of stupid
misdirection were about twenty times as many.
AM has very few heuristics for deciding that something was
⊗4un⊗*interesting, that work on it should halt for a long time.
Rather, AM simply won't have anything positive to say about that
concept, and other concepts are explored instead, essentially by
default. Each concept has a worth component which corresponds to
its right to life (its right to occupy storage in core). This number
slowly declines with time, and is raised whenever something
interesting happens with that concept. If it ever falls below a
certain threshold, and if space is exhausted$$ No concepts were
forgotten in this way until near the end of AM's runs, when AM would
usually collapse from several causes including lack of space. $,
then the concept is forgotten: its list cells are garbage collected,
and all references to it are erased, save those which will keep it
from being re-created. This again is not purposeful forgetting, but
rather by default; not because X is seen as a dead-end, but simply
because other concepts seem so much more interesting for a long time.
Thus
.end;
AM did not develop the sixty "losers" very much: they ended up
with an average of only 1.5 tasks relevant to them ever having been
chosen. The "winners" averaged about twice as many tasks which
helped fill them out more. Also, the worth ratings of the losers
ended up far below those of the winners in all but two cases.
.if false then start;
. So AM really did judge the
value of its new concepts quite well.
The final aspect of this important dimension of evaluation is the
quality of the reasons AM used to support each task it chose to work
on. Again, the English phrases corresponded quite nicely to the
"real" reasons a human might give to justify why something was worth
trying, and the ordering of the tasks on the agenda was rarely far
off from the one that I would have picked myself. This was perhaps
AM's greatest success: the rationality of its actions.
.end;
. SSSEC(The Character of the User-System Interactions)
AM is not a "user-oriented" system. There were many nice
human-interaction features in the original grandiose proposal for AM
which never got off the drawing board. At the heart of these features
were two assumptions:
.BN
λλ The user must understand AM, and AM must likewise have a good
model of the particular human using AM. The only time either should
initiate a message is when his model of the other is not what he
wants that model to be. In that case, the message should be
specifically designed to fix that discrepancy.$$ This idea was
motivated by a lecture given in 1975 by Terry Winograd. $
λλ Each kind of message which is to pass between AM and its user
should have its own appropriate language. Thus there should be a
terse comment language, whereby the user can note how he feels about
what AM is doing, a questioning language for either party to get/give
reasons to the other, a picture language for communicating certain
relationships, etc.
.E
Neither of these ideas ever made it into the LISP code that is now
AM, although they are certainly not prohibited in any way by AM's
design.
.if false then start;
It would be a separate project, at or above the level of a
master's thesis, for someone to build a nice user interface for AM$$
I am not actually calling for this to be done, merely indicating the
magnitude of the effort involved. A VERY nice user interface might be
much harder, at the level of a dissertation. $.
.end;
As one might expect, the reason for this atrophy is simply that
very little guidance from the user was needed by AM. In fact, all
the discoveries, cpu time quotes, etc. mentioned in this document
are taken from totally unguided runs by AM. If the user guides as
well as he can, then about a factor of 2 or 3 speedup is possible. Of
course, this assumes that the user is dragging AM directly along a
line of development he knows will be successful. The user's "reasons"
at each step are based essentially on hindsight. Thus this is not at
all "fair". If AM ever becomes more user-oriented, it would be nice
to let children (say 6-12 years old) experiment with it, to observe
them working with it in domains unfamiliar to either of them.$$
Starred (*) exercise for the reader: carry out such a project on a
statistically significant sample of children, wait thirty years, and
observe the incidence of mathematicians and scientists in general,
compared to the national averages. Within whatever occupation they've
chosen, rate their creativity and productivity. $
.ONCE TURN ON "{⎇"
The user can "kick" AM in one direction or another, e.g., by
interrupting and telling AM that Sets are more interesting than
Numbers.
.if false then start;
$$ To actually do this, the user will type control-I to
interrupt AM. He then types I, meaning "alter the interest of",
followed by the word "Sets". AM then asks whether this is to be
raised or lowered. He types back R, and AM asks how much, on a 1-10
scale. He replies 9, say, and then repeats this process for the
concept "Numbers". $
.end;
Even in that particular case, AM fails to
develop any higher-level set concepts (diagonalization, infinite
sets, etc.) and simply wallows around in finite set theory (de
Morgan's laws, associativity of Union, etc.). When geometric
concepts are input, and AM is kicked in ⊗4that⊗* direction, much
nicer results are obtained. See the report on the
Geometry experiment, page {[3] GEOEX⎇.
There is one important result to observe: the very best examples of
AM in action were brought to full fruition only by a human developer.
That is, AM thought of a couple great concepts, but couldn't develop
them well on its own. A human (the author) then took them and worked
on them by hand, and interesting results were achieved. These results
could be told to AM, who could then go off and look for new concepts
to investigate. This interaction is of course at a much lower
frequency than the kind of rapid-fire question/answering talked about
above. Yet it seems that such synergy may be the ultimate mode of
AM-like systems.
. SSSEC(AM's Intuitive Powers)
.QQ
Intuitive conviction surpasses logic as the brilliance of the sun
surpasses the pale light of the moon.
-- Kline
.ESS
Let me hasten to mention that the word "intuitive" in this
subsection's title is not related to the (currently non-existent)
"Intuitions" facets of the concepts. What is meant is the totality
of plausible reasoning which AM engages in: empirical induction,
generalization, specialization, maintaining reasons for jobs on the
agenda list, creation of analogies between bunches of concepts, etc.
AM only considers conjectures which have been explicitly suggested:
either by empirical evidence, by analogy, or (now de-implemented) by
Intuition facets. Once a conjecture has been formulated, it is
tested in all ways possible: new experimental evidence is sought
(especially extreme cases), it is examined formally$$ Currently, this
is done in trivial ways. An open problem, which is under attack now,
is to add more powerful formal reasoning abilities to AM. $ to see
if it follows from already-known conjectures, etc.
Because of this grounding in plausibility, the only conjectures the
user ever sees (the ones AM is testing) are quite believable. If
they turn out to be false, both he and AM are surprised. For
example, both AM and the user were disappointed when nothing came out
of the concept of Uniquely-prime-addable numbers (positive integers
which can be represented as the sum of two primes in precisely one
way). Several conjectures were proposed via analogy with unique
prime factorization, but none of them held experimentally. Each of
them seemed worth investigating, to both the user and the system.
It is still not known whether there is anything interesting about that concept
or not.
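The concept is easy to regenerate empirically. A sketch (in Python, not AM's LISP; the names are ours, not AM's) that enumerates uniquely-prime-addable numbers by counting unordered prime pairs:

```python
def is_prime(n):
    return n > 1 and all(n % k for k in range(2, int(n**0.5) + 1))

def prime_pair_sums(n):
    """Unordered pairs of primes (p, q), p <= q, with p + q = n."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

# Positive integers representable as the sum of two primes in
# precisely one way:
uniquely_prime_addable = [n for n in range(2, 40)
                          if len(prime_pair_sums(n)) == 1]
# [4, 5, 6, 7, 8, 9, 12, 13, 15, 19, 21, 25, 31, 33, 39]
```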
.if false then start;
AM's estimates of the value of each task it attempts were often far
off from what hindsight proved their true values to be. Yet this was
not so different from the situation a real researcher faces, and it
made little difference on the discoveries and failures of the system.
AM occasionally mismanaged its resources due to errors in these
estimates. To correct for such erroneous prejudgments, heuristic
rules were permitted to dynamically alter the time/space quanta for
the current task. If some interesting new result turned up, then some
extra resources would be allotted. If certain heuristics failed, they
could reduce the time limits, so not as much total cpu time would be
wasted on this loser.
An example of a nice conjecture is the unique factorization one. A
nice analogy was the one between angles and numbers (leading to the
application of Goldbach's conjecture). Another nice analogy was
between numbers and bags (and hence between bag-operations and what
we commonly call arithmetic operations).
Some poor analogies were considered, like the one between bags and
singleton-bags. The ramifications of this analogy were painfully
trivial$$ The bag-operations, applied to singletons, did not produce
singletons as their result: (x)∪(y) is (x,y) which is not a
singleton. Whether they did or not depended only on the equality or
inequality of the two arguments. There were many tiny conjectures
proposed which merely re-echoed this general conclusion. $.
.end;
. SSSEC(Experiments on AM)
.ONCE TURN ON "{⎇"
The experiments described in Section {[2]EXPT⎇.{[2]EXPTSSEC⎇ (page
{[3]EXPTPAGE⎇ ff) provide some results relevant to the overall value
of the AM system. The reader should consult that section for
details; neither the experiments nor their results will be repeated
here. A few conclusions will be summarized:
The worth-numbering scheme for the concepts is fairly robust: even
when all the concepts' worths are initialized at the same value, the
performance of AM doesn't collapse, although it is noticeably
degraded.
Certain mutilations of the priority-value scheme for tasks on the agenda will
cripple AM, but it can resist most of the small changes tried in
various experiments.
Sometimes, removing just a single concept (e.g., Equality) was
enough to block AM from discovering some valuable concepts it
otherwise got (in this case, Numbers). This makes AM's behavior sound
very fragile, like a slender chain of advancement. But on the other
hand, many concepts (e.g., TIMES, Timberline, Primes$$
Primes was discovered independently as follows: all numbers (>0) were seen
to be representable as the sum of smaller numbers; Add was known to be
analogous to TIMES; but not all numbers (>1) appeared to be representable as the
↓_product_↓ of two smaller ones; Rule number {[3]PRIM2⎇ triggered (see Appendix
{[2]ALLHEU⎇, page {[3]PRIM2P⎇), and AM defined the set of exceptions: the set
of numbers which could not be expressed as the product of two smaller ones; i.e.,
the primes. $) were discovered in
several independent ways. If AM's behavior is a chain, it is
multiply-stranded (except for a few weak spots, like Numbers).
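The independent route to Primes described in the footnote above is mechanical enough to sketch (in Python, not AM's LISP; the names are ours, not AM's): collect the exceptions to the rule that every number is the product of two smaller ones.

```python
def product_of_two_smaller(n):
    """Can n be written as a * b with both factors smaller than n?"""
    return any(n % a == 0 and 1 < n // a < n for a in range(2, n))

# The set of exceptions among numbers > 1: i.e., the primes.
exceptions = [n for n in range(2, 30) if not product_of_two_smaller(n)]
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```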
The heuristics are specific to their stated domain of applicability.
Thus when working in geometry, the Operation heuristics were just as
useful as they were when AM worked in elementary set theory or number
theory. The set of facets seemed adequate for those domains, too. The
Intuition facet, which was rejected as a valid source of information
about sets and numbers, might have been more acceptable in geometry
(e.g., something similar to Gelernter's model of a geometric
situation).
.if false then start;
All in all, then, we conclude that AM was fairly tough, and about as
general as its heuristics claimed it was. AM is not invincible,
infallible, or universal. Its strength lies in careful use of
heuristics. If there aren't enough domain-specific heuristics around,
the system will simply not perform well in that domain. If the
heuristic-using control structure of AM is tampered with$$ e.g.,
treat all reasons as equivalent, so you just COUNT the number of
reasons a task has, to determine its priority on the agenda.$, there
is some chance of losing vital guiding information which the
heuristics would otherwise supply.
.end;
. SSSEC(How to Perform Experiments on AM)
.ONCE TURN ON "{⎇"
The very fact that the kinds of experiments mentioned in the last
section (and described in detail in Section {[2]EXPT⎇.{[2]EXPTSSEC⎇)
can be "set up" and performed on AM, reflects an important quality
of the AM program.
Most of those experiments took only a matter of minutes to set up,
requiring only a few tiny modifications to AM. For example, the one where all the
Worth ratings were initialized to the same value was done by
evaluating the single LISP expression:
.B816
(MAPC CONCEPTS '(λ (c) (PUT c 'Worth 200)))
.E
Similarly, here is how AM was modified to treat all tasks as if they
had equal value: the function Pick-task has a statement of the form
.B816
(SETQ Current-task (First-member-of Agenda))
.E
All that was necessary was to replace the call on the function
"⊗6First-member-of⊗*"
by the function "⊗6Random-member-of⊗*".
Even the most sophisticated experiment, the introduction of a new
bunch of concepts -- those dealing with geometric notions like
Between, Angle, Line -- took only a day of conscious work to
set up.
Of course ⊗4running⊗* the experiment involves the expenditure of hours of
cpu time.
.if false then start;
, so only a limited number were actually performed.$$ Those described
in the last chapter. The series of experiments began at the same time that
this document was being written, and was intended originally only as a diversion
from the tedium of writing. The interesting character of their results
convinced me they should be included, even though they are few in number and
quite incomplete. $
.end;
There are certain experiments one can't easily perform on AM:
removing all its heuristics, for example. Most heuristic search
programs would then wallow around, displaying just how big their
search space really was. But AM would just sit there, since it'd have
nothing plausible to do.
Many other experiments, while cute and easy to set up, are quite
costly in terms of cpu time. Consider, for example, the class of experiments of
the form: "remove heuristics x, y, and z, and observe the resultant
effect on AM's behavior". Each such observation would entail running AM
for an hour or two of cpu time! Considering the number of subsets of
heuristics, not all these questions are going to get answered in our
universe's lifetime. Considering the small probable payoff from any
one such experiment, very few should actually be attempted.
One nice experiment would be to monitor the contribution each heuristic
is making. That is, record each time it is used and record the final outcome
of its activation (which may be several cycles later).
Unfortunately,
AM's heuristics are not all coded as separate Lisp entities, which one could
then "trace". Rather, they are often interwoven with each other into large
program pieces. So this experiment can't be easily set up and run on AM.
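Had each heuristic been coded as a separate entity, the monitoring would reduce to a simple wrapper around every rule. A sketch of such instrumentation (Python; the names are hypothetical, not AM's):

```python
from collections import defaultdict
from functools import wraps

usage_log = defaultdict(list)

def traced(heuristic):
    """Record each invocation of a heuristic together with its outcome."""
    @wraps(heuristic)
    def wrapper(*args, **kwargs):
        result = heuristic(*args, **kwargs)
        usage_log[heuristic.__name__].append(result)
        return result
    return wrapper

@traced
def fill_in_examples(concept):
    # Stand-in body; a real heuristic would propose new tasks or concepts.
    return "examples-of-" + concept

fill_in_examples("Primes")
fill_in_examples("Sets")
# usage_log now records each heuristic's activity and outcomes.
```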
Even those experiments which can be run can be set up
only by someone familiar with the LISP code of AM. It would be
quite hard to modify AM so that the untrained user could easily
perform these experiments. Essentially, that would demand that AM
have a deep understanding of its own structure. This is of course
desirable, fascinating, challenging, but wasn't part of the design of
AM.$$ A general suggestion for future research projects in this area:
such systems should be designed in a way which facilitates a
poorly-trained user not only ↓_using_↓ the system but
↓_experimenting_↓ on it. $
. SSSEC(Future Implications of this Project)
One harsh measure of AM would be to demand what possible applications
it will have. This really means (i) the uses for the AM system, (ii)
the uses for the ideas of how to create such systems, (iii)
conclusions about math and science one can draw from experiments with
AM.
Here are some of these implications, both real and potential:
.BN PREFACE 1 INDENT 0,3,0
λλ New tools for computer scientists who want to create large
knowledge-based systems to emulate some creative human activity.
.BEGIN PREFACE 0 INDENT 8
⊗61a.⊗* The modular representation of knowledge that AM uses might
prove to be effective in any knowledge-based system. Division of a
global problem into a multitude of small chunks, each of them of the
form of setting up one quite local "expert" on some concept, is a
nice way to make a hard task more manageable. Conceivably, each
needed expert could be filled in by a human who really is an expert
on that topic. Then the global abilities of the system would be able
to rely on quite sophisticated local criteria. Fixing a set of
facets once and for all permits effective inter-module communication.
⊗61b.⊗* Some ideas may carry over unchanged into many fields of human
creativity, wherever local guiding rules exist. These include: (a)
ideas about heuristics having domains of applicability, (b) the
policy of tacking them onto the most general knowledge source
(concept, module) they are relevant to, (c) the rippling scheme to
locate relevant knowledge, etc.
.E
λλ A body of heuristics which can be built upon by others.
.BEGIN PREFACE 0 INDENT 8
⊗62a.⊗* Most of the particular heuristic judgmental criteria for
interestingness, utility, etc., might be valid in developing
theorizers in other sciences. Recall that each rule has its domain
of applicability; many of the heuristics in AM are quite general.
⊗62b.⊗* Just within the small domain in which AM already works, this
base of heuristics might be enlarged through contact with various
mathematicians. If they are willing to introspect and add some of
their "rules" to AM's existing base, it might gradually grow more and
more powerful.
⊗62c.⊗* Carrying this last point to the limit of possibility, one
might imagine the program possessing more heuristics than any single
human. Of course, AM as it stands now is missing so much of the
`human element', the life experiences that a mathematician draws upon
continually for inspiration, that merely amassing more heuristics
won't automatically push it to the level of a super-human
intelligence. Another far-out scenario is that of the great
mathematicians of each generation pouring their individual heuristics
into an AM-like system. After a few generations have come and gone,
running that program could be a valuable way to bring about
`interactions' between people who were not contemporaries.
.E
λλ New and better strategies for math educators.
.BEGIN PREFACE 0 INDENT 8
⊗63a.⊗* Since the key to AM's success seems to be its heuristics, and
not the particular concepts it knows, the whole orientation of
mathematics education should perhaps be modified, to provide experiences for
the student which will build up these rules in his mind. Learning a
new theorem is worth much less than learning a new heuristic which
lets you discover new theorems.$$ Usually. One kind of exception is
the following: the ability to take a powerful theorem, and extract
from it a new, powerful heuristic. AM cannot do this, but it may turn
out that this mechanism is quite crucial to how humans obtain new
heuristics. This is another open research problem. $ I am
far from the first to urge such a revision (see, e.g., [Koestler 67], p.265,
or see [Papert 72]).
⊗63b.⊗* If the repertoire of intuition (simulated real-world
scenarios) were sufficient for AM to develop elementary concepts of
math, then educators should ensure that children (4-6 years old) are
thoroughly exposed to those scenarios. Such activities would include
seesaws, slides, piling marbles into pans of a balance scale,
comparing the heights of towers built out of cubical blocks, solving
a jigsaw puzzle, etc. Unfortunately, AM failed to show the value of
these few scenarios. This was a potential application which was not
confirmed.
⊗63c.⊗* One use for AM itself would be as a "fun" teaching tool. If a
very nice user interface
is constructed, AM could serve as a model for, say,
college freshmen with no math research experience. They could watch
AM, see the kinds of things it does, play with it, and perhaps get a
real flavor for (and get turned on by) doing math research. A vast
number of brilliant minds are too turned off by high-school drilling
and college calculus to stick around long enough to find out how
exciting -- and different -- research math is compared to textbook
math.
.E
λλ Further experiments on AM might tell us something about how the
theory formation task changes as a theory grows in sophistication.
For example, can the same methods which lead AM from premathematical
concepts to arithmetic also lead AM from number systems up to
abstract algebra? Or are a new set of heuristic rules or extra
concepts required? My guess is that a few of each are lacking
currently, but ⊗4only⊗* a few. There is a great deal of disagreement
about this subject among mathematicians. By tracing along the
development of mathematics, one might categorize discoveries by how
easy they would be for an AM-like system to find. Sometimes, a
discovery required the invention of a brand new heuristic rule, which
would clearly be beyond AM as currently designed. Sometimes,
discovery is based on the lucky random combination of existing
concepts, for no good ⊗4a priori⊗* reason. It would be instructive to
find out how often this is necessarily the case: how often ⊗4can't⊗*
a mathematical discovery be motivated and "explained" using heuristic
rules of the kind AM possesses?
λλ An unanticipated result was the creation of new-to-Mankind math
(both directly and by defining new, interesting concepts to
investigate by hand). The amount of new mathematics thus
developed to date is minuscule.
.BEGIN PREFACE 0 INDENT 8
⊗65a.⊗* As described in (2c) above, AM might absorb heuristics from
several individuals and thereby integrate their particular insights.
This might eventually result in new mathematics being discovered.
⊗65b.⊗* An even more exciting prospect, which never materialized, was
that AM would find a new redivision of existing concepts, an
alternate formulation of some established theory, much like
Hamiltonian mechanics is an alternate unification of the data which
led to Newtonian mechanics. The only rudimentary behavior along
these lines was when AM occasionally derived a familiar concept in an
abnormal way (e.g., TIMES was derived in four ways; Prime pairs were
noticed by restricting Addition to primes).
.END
.E
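The modular representation praised in point 1a above can be suggested by a small sketch (Python; the facet names abbreviate AM's, and `make_concept` is invented purely for illustration). Fixing one set of facet slots once and for all lets every module interrogate every other in the same vocabulary:

```python
# The fixed facet vocabulary, shared by every concept module.
FACETS = ("definition", "examples", "generalizations",
          "specializations", "worth", "heuristics")

def make_concept(name, **slots):
    """A concept is a frame whose slots are drawn from the fixed facet set."""
    unknown = set(slots) - set(FACETS)
    if unknown:
        raise ValueError("not a facet: %s" % unknown)
    concept = {facet: slots.get(facet) for facet in FACETS}
    concept["name"] = name
    return concept

primes = make_concept(
    "Primes",
    definition=lambda n: n > 1 and all(n % d for d in range(2, n)),
    worth=400,
)
# Any other module can now ask for primes["definition"], primes["worth"], etc.
```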
. SSSEC(Open Problemsα: Suggestions for Future Research)
While AM can and should stand as a complete research project, part of
its value will stem from whatever future studies are sparked by it.
Of course the "evaluation" of AM along this dimension must wait for
years, but even at the present time several such open problems come
to mind:
.BN; PREFACE 1; AT ",," ⊂ "⊗8# ⊗*" ⊃; TURN ON "{⎇";
,, Devise Meta-heuristics, rules capable of operating on and
synthesizing new heuristic rules. AM has shown the solution of this
problem to be both nontrivial and indispensable. AM's progress
ground to a halt because fresh, powerful heuristics were never
produced. The next point suggests that the same need for new rules
exists in mathematics as a whole:
,, Examine the history of mathematics, and gradually build up a list
of the heuristic rules used. Does the following thesis have any
validity: "⊗4The development of mathematics is essentially the
development of new heuristics⊗*." That is, can we `factor out' all
the discoveries reachable by the set of heuristics available (known)
to the mathematicians at some time in history, and then explain each
new big discovery as requiring the synthesis of a brand new
heuristic? For example, Bolyai and Lobachevsky did this a century
ago when they decided that counter-intuitive systems might still be
consistent and interesting. Non-Euclidean geometry resulted, and no
mathematician today would think twice about using the heuristic they
developed. Einstein invented a new heuristic more recently, when he
dared to consider that counter-intuitive systems might actually have
physical reality.$$ As Courant says, "When Einstein tried to reduce
the notion of `simultaneous events occurring at different places' to
observable phenomena, when he unmasked as a metaphysical prejudice
the belief that this concept must have a scientific meaning in
itself, he had found the key to his theory of relativity." $ What was
once a bold new method is now a standard tool in theoretical physics.
,, In a far less dramatic vein, a hard open problem is that of
building up a body of rules for symbolically instantiating a
definition (a LISP predicate). These rules may be structured
hierarchically, so that rules specific to operating on `operations
whose domain and range are equal' may be gathered. Is this set
finite and manageable; i.e., does some sort of "closure" occur after
a few hundred (thousand?) such rules are assembled?
,, More generally, we can ask for the expansion of all the heuristic
rules, of all categories. This may be done by eliciting them from
mathematicians, or automatically by the application of very
sophisticated meta-heuristics. Some categories of rules include: how
to generalize/specialize definitions, how to find examples of a given
concept, how to optimize LISP algorithms.
,, Experiments can be done on AM. A few have been performed already,
many more are proposed in Section {[2]RESULT⎇.{[3]EXPTSSEC⎇, and no
doubt some additional ones have already occurred to the reader.
,, Extend the analysis already begun (see p. {[3]SANA⎇)
of the set of heuristics AM
possesses.
One reason for such an analysis would be to achieve a better understanding
of the contribution of the heuristics.
In some sense, the heuristics and the choice of starting concepts "encode"
the discoveries which AM makes, and the way it makes them. A better understanding
of that encoding may lead to new ideas for AM and for future AM-like systems.
,, Rewrite AM. In Chapter {[2]OVERV⎇, on page {[3]HSEARCH⎇, it was pointed
out that there are two common species of heuristic search programs.
One type has a legal move generator, and heuristics to constrain it.
The second type, including AM, has only a set of heuristics, and they act as
plausible move generators. Since AM seemed to create new concepts,
propose new conjectures, and formulate new tasks
in a very few
distinct ways, it might very well be feasible to find a purely syntactic
"legal move generator" for AM, and to convert each existing heuristic into
a form of constraint. In that case, one could, e.g., remove all the heuristics
and still see a meaningful (if explosive) activity proceed. There might be a few
surprises down that path.
,, A more tractable project, a subset of the former one, would be to recode
just the conjecture-finding heuristics as constraints on a new, purely syntactic
"legal conjecture generator". A simple Generate-and-Test paradigm would be used to
synthesize and examine large numbers of conjectures.
Again, removing all the heuristics
would be a worthwhile experiment.
,, At the limits of feasibility, one can imagine trying to extend AM into more and
more fields, into less-formalizable domains. International politics
has already been suggested as a very hard future application area.
,, Abstracting that last point, try to build up a set of criteria
which make a domain ripe for automating (e.g., it possesses a strong
theory, it is knowledge-rich (many heuristics exist), the performance
of the professionals/experts is much better than that of the typical
practitioners, the new discoveries in that field all fall into a
small variety of syntactic formats,...?). Initially, this study
might help humans build better and more appropriate scientific
discovery programs. Someday, it might even permit the creation of an
automatic-theory-formation-program-writer.
,, The interaction between AM and the user is minimal and painful.
Is there a more effective language for communication? Should several
languages exist, depending on the type of message to be sent
(pictures, control characters, a subset of natural language,
induction from examples, etc.)? Can AM's output be raised in
sophistication by introducing an internal model of the user and his
state of knowledge at each moment? Perhaps AM should also contain
models of several known ⊗4groups⊗* of users (mathematicians,
computer scientists, students) -- say, stored as concepts in a
specialization/generalization hierarchy.
,, Human protocol studies may be appropriate, to test out the model
of mathematical research which AM puts forward. Are the sequences of
actions similar? Are the mistakes analogous? Do the pauses which the
humans emit ⊗4quantitatively⊗* correspond to AM's periods of gathering
and running `Suggest' heuristics?
,, Can the idea of Intuition functions be developed into a useful
mechanism? If not, how else might real-world experiences be made
available to an automated researcher to draw upon (for analogies, to
base new theories upon)? Could one interface physical effectors and
receptors and quite literally allow the program to `play around in
the real world' for its analogies?
,, Most of the `future implications' discussed in the last section
suggest future activities (e.g., new educational experiments and
techniques).
,, Most of the `limiting assumptions' discussed in a later section
(page {[3]LIMASS⎇) can be tackled with today's techniques (plus a
great deal of effort). Thus each of them counts as an open problem
for research.
,, Perform an information-theoretic analysis on AM. What is the value
of each heuristic? the new information content of each new conjecture?
,, If you're interested in natural language, the very hard problem exists of giving
AM (or a similar system) the ability to really do inferential processing
on the ⊗4reasons⊗* attached to tasks on the agenda. Instead of just being
able to test for equality of two reasons, it would be much more intelligent to
be able to infer the kind of relationship between any two reasons; if they overlap
semantically, we'd like to be able to compute precisely
how that overlap should affect the overall rating for the task; etc.
.NEWT3: PAGE;
,, Modify the control structure of AM, as follows. Allow mini-goals to exist,
and supply new rules for setting them up
(plausible goal generators)
and altering those goals, plus some new rules
and algorithms for satisfying them. The modification I have in mind would result
in new tasks being proposed because of certain current goals, and existing tasks
would be reordered so as to raise the chance of satisfying some important goal.
Finally, the human watching AM would be able to observe the rationality
(hopefully) of the goals which were set. The simple "Focus of Attention"
mechanism already in AM is a tiny step in this goal-oriented direction.
Note that this proposal itself demonstrates that AM is not inherently opposed
to a goal-directed control structure. Rather, AM simply possesses only a partial set of
mechanisms for complete reasoning about its domain.
.E
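One of the open problems above proposed recoding the conjecture-finding heuristics as constraints on a purely syntactic "legal conjecture generator". A toy sketch of the unconstrained Generate-and-Test version (Python; the concepts and the single template here are invented): enumerate every instance of one syntactic template and test each candidate empirically against stored examples.

```python
concepts = {  # toy concepts, each with a stored set of examples
    "Numbers": {1, 2, 3, 4, 5, 6, 7, 8, 9},
    "Evens":   {2, 4, 6, 8},
    "Primes":  {2, 3, 5, 7},
    "Squares": {1, 4, 9},
}

def generate_conjectures(concepts):
    """Legal conjecture generator: every ordered pair of concepts fills the
    template 'C1 is a specialization of C2'; each candidate conjecture is
    then tested empirically against the stored examples."""
    for c1, ex1 in concepts.items():
        for c2, ex2 in concepts.items():
            if c1 != c2 and ex1 <= ex2:   # the empirical test: subset?
                yield (c1, "is-a-specialization-of", c2)

conjectures = list(generate_conjectures(concepts))
# With no heuristics, every pair is tried -- explosive but well-defined.
```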
. SSSEC(Comparison to Other Systems)
One popular way to judge a system is to compare it to other, similar
systems, and/or to others' proposed criteria for such systems. There
is no other project (known to the author)
having the same objective: automated math research.$$
In
[Atkin & Birch 1971], e.g.,
we find no mention of the computer except as a number cruncher. $
Many somewhat related efforts have been reported
in the literature
and will be mentioned here.
Several projects have been undertaken which overlap small pieces of
the AM system and in addition concentrate deeply upon some area
⊗4not⊗* present in AM. For example,
the CLET system [Badre 73]
worked on learning the decimal addition algorithm$$ Given the
addition table up to 10 + 10, plus an English text description of
what it means to carry, how and when to carry, etc., actually write a
program capable of adding two 3-digit numbers $ but the
"⊗4mathematics discovery⊗*" aspects of that system were neither
emphasized nor worth emphasizing; it was an interesting natural
language communication study. The same comment applies to several
related studies by IMSSS$$See [Smith 74a], for example.$.
Boyer and Moore's theorem-prover
[Boyer&Moore 75]
embodies
some of the spirit of AM (e.g., generalizing the definition of a LISP
function), but its motivations are quite different, its knowledge
base is minimal, and its methods purely formal.$$ This is not meant
as criticism; considering the goals of those researchers, and the age
of that system, their work is quite significant. $ The same comments
apply to the SAM program [Guard 69], in which a resolution theorem-prover is set to
work on unsolved problems in lattice theory.
.ONCE TURN ON "{⎇";
Among the attempts to incorporate heuristic knowledge into a theorem
prover, we should also mention [Wang 60],
[Pitrat 70],
[Bledsoe 71], and [Brotz 74].
How did AM differ from these "heuristic theorem-provers"?
The goal-driven control structure of these systems is a real but only minor
difference from AM's control structure
(e.g., AM's "focus of attention" is a rudimentary step in that direction; see
p. {[3]NEWT3⎇).
The fact that their overall activity is typically labelled as deductive
is also not a fundamental distinction
(since constructing a proof is usually in practice quite ↓_⊗4in⊗*_↓ductive).
Even the character of the inference processes is analogous: the provers typically
contain a couple of binary inference rules, like Modus Ponens, which are relatively
risky to
apply but can yield big results; AM's few "binary"
operators have the same characteristics: Compose, Canonize, Logically-combine
(disjoin and conjoin).
The main distinction is that
the theorem provers each incorporate only
a handful of heuristics. The reason for this, in turn, is the paucity of
good heuristics which exist for the very general task environment in which they
operate: domain-independent (asemantic) predicate calculus theorem proving.
The need for additional guidance was recognized by these researchers.
For example, see [Wang 60], p. 3 and p. 17. Or as
Bledsoe says$$ [Bledsoe 71], p. 73 $:
.BEGIN FILL INDENT 5,5,5 SELECT 3
There is a real difference between ⊗4doing⊗* some mathematics and
⊗4being⊗* a mathematician. The difference is principally one of
judgment: in the selection of a problem (theorem to be proved); in
determining its relevance;... It is precisely in these areas that
machine provers have been so lacking. This kind of judgment has to be
supplied by the user... Thus a crucial part of the resolution proof
is the ⊗4selection⊗* of the reference theorems by the ⊗4human⊗* user;
the human, by this one action, usually employs more skill than that
used by the computer in the proof.
.END
Many researchers have constructed programs which pioneered some of
the techniques AM uses.
[Gelernter 63] reports
the use of prototypical examples as analogic models to guide search in
geometry, and [Bundy 73] employs models of "sticks" to help his program work with
natural numbers. The single heuristic of analogy was studied in
[Evans 68] and [Kling 71].
Brotz's program, [Brotz 74], uses this to propose useful
lemmata.
Theory formation systems in ⊗4any⊗* field have been few. Meta-Dendral
[Buchanan 74]
represents perhaps the best of these. Its task is to unify a body of mass
spectral data (examples of "proper" identifications of spectra) into
a small body of rules for making identifications. Thus even this
system is given a fixed task, a fixed set of data to find
regularities within. AM, however, must find its own data, and take
the responsibility for managing its own time, for not looking too
long at worthless data.$$
In case that wasn't clear:
Meta-Dendral has a fixed set of templates for rules which
it wishes to find, and a fixed vocabulary of mass spectral concepts which can
be plugged into those templates. AM also has only a few stock formats for conjectures,
but it selectively enlarges its vocabulary of math concepts. $
There has been much written about scientific theory formation
(e.g., [Hempel 52]), but very little of it is specific enough to be of immediate
use to AI researchers. A couple of pointers to excellent discussions of this sort
are: [Fogel 66],
[Simon 73], and [Buchanan 75].
Also worth noting is a discussion near the end of [Amarel 69], in which
"formation" and "modelling" problems are treated:
.BEGIN FILL INDENT 5,5,5 SELECT 3; PREFACE 0;
The problem of model finding is related to the following general question
raised by Schutzenberger [In discussion at the Conference on Intelligence and
Intelligent Systems, Athens, Ga., 1967]: ⊗4`What do we want to do with intelligent
systems that relates to the work of mathematicians?'⊗*.
So far all we have done in this general area is to emulate some of the reasonably simple activities
of mathematicians, which is finding consequences from given assumptions,
reasoning, proving
theorems. A certain amount of work of this type was already done in the propositional and
predicate calculi, as well as in some other mathematical systems. But this is
only one aspect of the work that goes on in mathematics.
Another very important aspect is the one of finding general properties of structures, finding
analogies, similarities, isomorphisms, and so on.
This is the type of activity that is extremely important
for our understanding of model-finding mechanisms. Work in this
area is more difficult than theorem-proving. The problem here is that of
↓_theorem finding._↓
.E; ONCE PREFACE 0;
AM is one of the first attempts to construct a "theorem-finding" program.
As Amarel noted, it may be
possible to learn from such programs how to tackle the general task of automating
scientific research.
Besides "math systems", and "creative thinking systems", and "theory
formation systems", we should at least discuss others' thoughts on
the issue of algorithmically doing math research. Some individuals
feel it is not so far-fetched to imagine automating mathematical
research (e.g., Paul Cohen). Others (e.g., Polya) would probably
disagree. The presence of a high-speed, general-purpose
symbol manipulator
in our midst now makes investigation of that question
possible.
There has been very little published thought about discovery in
mathematics from an algorithmic point of view; even clear thinkers
like Polya and Poincare' treat mathematical ability as a sacred,
almost mystic quality, tied to the unconscious. The writings of
philosophers and psychologists invariably attempt to examine human
performance and belief, which are far more manageable than creativity
⊗4in vitro⊗*. Belief formulae in inductive logic$$ For example, see
[Hintikka 62], [Pietarinin 72].
The latter also contains a good summary of Carnap's λ,αα formalization. $
invariably fall back upon how well they fit
human measurements. The abilities of a computer and a brain are too
distinct to justify blindly seeking results (let alone
algorithms!) in one which match those of the other.
.SSEC(Capabilities and Limitations of AM)
.if false then start;
.ONCE TURN ON "{⎇";
The first two subsections contain a general discussion of what AM can and
can't do. Later subsections deal with powers and limitations inherent in using
an agenda scheme,
in fixing the domain of AM, and in picking one specific
model of math research to build AM upon.
The AM
program exists only because a great many simplifying assumptions were tolerated;
these
are discussed
in Section {SECNUM⎇.{SSECNUM⎇.{[3]ASSUMPS⎇ (p. {[3]LIMASS⎇).
Finally, some speculation is made about the ultimate
powers and weaknesses of any systems which are designed very much like AM.
.end;
. SSSEC(Current Abilities)
⊗4What fields has AM worked in so far?⊗* AM is now able to explore a
small bit of the theory of sets, data types, numbers, and plane
geometry. It by no means has been fed -- nor has it rediscovered --
a large fraction of what is known in any of those fields. It might be
more accurate to be humble and restate those domains as: elementary
finite set theory, trivial observations about four kinds of data
types, arithmetic and elementary divisibility theory, and simple
relationships between lines, angles, and triangles. So a
sophisticated concept in each domain -- which was discovered by AM --
might be:
.BEGIN INDENT 4,7,0; PREFACE 0;
⊗8#⊗*
de Morgan's laws
⊗8#⊗*
the fact that Delete-o-Insert$$
Take an item x, insert it into (the front of) structure B, then delete one
(the first) occurrence of x from B. $ never alters Bags or
Lists
⊗8#⊗*
unique factorization
⊗8#⊗*
similar triangles
.E
⊗4Can AM work in a new field, like politics?⊗* AM can work in a new
elementary, formalized domain, if it is fed a supplemental base of
conceptual primitives for that domain. To work in plane geometry, it
sufficed to give AM about twenty new primitive concepts, each with a
few parts filled in. Other domains which AM could work in include
elementary mechanics, game-playing, and programming.
The more informal (non-internally-formalizable) the desired field, the less of AM
is relevant. Perhaps an AM-like system could be built for a constrained, precise
political task.$$ For example, such a politics-oriented AM-like system
might conceive the notion of a group of political entities which
view themselves as quite disparate, but which are viewed from the outside
as a single unit: e.g., `the Arabs', `the American Indians'.
Conjectures about this concept might include its reputation as a poor
combatant (and why). Many of the same facets AM uses would carry over to represent
concepts in that new domain. $
Disclaimer: Even for a very small domain, the amount of
common-sense knowledge such a system would need is staggering.
It is unfortunate to provide such a trivial answer to such an important question,
but there is no easy way to answer it more fully until years of additional research
are performed.
⊗4Can AM discover X? Why didn't it do Y?⊗* It is difficult to predict
whether AM will (without modifications) ever make a specific given
discovery. Although its capabilities are small, its limitations are
hazy. What makes the matter even worse is that, given a concept C
which AM missed discovering, there is probably a reasonable heuristic
rule which is missing from AM, which would enable that discovery. One
danger of this "debugging" is that a rule will be added which only
leads to that one desired discovery, and isn't good for anything
else.
In that case, the new heuristic rule would simply be an ⊗4encoding⊗*
of a specific bit of mathematics which AM would then appear to
discover using general methods.
This must be avoided at all costs, ⊗4even at the cost of
intentionally giving up a certain discovery.⊗* If the needed rule is
general -- it has many applications and leads to many interesting
results -- then it really was an oversight not to include it in AM.
Although I believe that there are not too many such omissions still
within the small realm AM explores, there is no objective way to
demonstrate that, except by further long tests with AM.
.ONCE TURN ON "{⎇"
⊗4In what ways are new concepts created?⊗* Although the answer to
this is accurately given in Section {[2]HEURS⎇.{[2]CREATECON⎇,
page {[3]CREATECONP⎇
(namely, this is mainly the jurisdiction of the right sides of
heuristic rules), and although I dislike the simple-minded way it
makes AM sound, the list below does characterize the major ways in
which new concepts get born:
.BN SELECT 6
Fill in examples of a concept (e.g., by instantiating or running its
definition)
Create a generalization of a given concept (e.g., by weakening its
definition)
Create a specialization of a given concept (e.g., by restricting its
domain/range)
Compose two operations f,g, thereby creating a new one h.
[Define h(x)≡f(g(x))]
Coalesce an operation f into a new one g. [Define g(x)≡f(x,x)]
Permute the order of the arguments of an operation. [Define g(x,y)≡f(y,x)]
Invert an operation [g(x)=y iff f(y)=x] (e.g., from Squaring, create
Square-rooting)
Canonize one predicate P1 with respect to a more general one P2
[create a new concept f, an operation, such that: P2(x,y) iff
P1(f(x),f(y))]
Create a new operation g, which is the repeated application of an
existing operation f.
The usual logical combinations of existing concepts x,y: ⊗6x∧y, x∨y,
¬x,⊗* etc.
.E
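Several of the operators above have one-line renderings in any functional notation. A hypothetical Python sketch of Compose, Coalesce, and Invert (far simpler, of course, than AM's facet-filling versions of these operators):

```python
def compose(f, g):
    """Compose two operations f,g into a new one h(x) = f(g(x))."""
    return lambda x: f(g(x))

def coalesce(f):
    """Coalesce a binary operation f into g(x) = f(x, x)."""
    return lambda x: f(x, x)

def invert(f, domain):
    """Invert f over a finite domain: g(y) = the x with f(x) = y."""
    table = {f(x): x for x in domain}
    return lambda y: table[y]

add = lambda x, y: x + y
doubling = coalesce(add)                 # coalescing Add yields Doubling
squaring = coalesce(lambda x, y: x * y)  # coalescing Times yields Squaring
square_root = invert(squaring, range(20))
```

For instance, `compose(doubling, squaring)` is the derived operation x ‛→ 2x⊗82⊗*.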
Below is a similar list, giving the primary ways in which AM
formulates new conjectures:
.BN SELECT 6
Notice that concept C1 is really an example of concept C2
Notice that concept C1 is really a specialization (or:
generalization) of C2
Notice that C1 is equal to C2; or: ⊗4almost always⊗* equal
Notice that C1 and C2 are related by some known concept
Check and update the domain/range of an existing operation
If two concepts are analogous, extend the analogy to their conjectures as well
.E SELECT 1
.if false then start;
In summary, we can say that AM has achieved its original purpose: to be
guided successfully by a large set of local heuristic rules, in the
discovery of new mathematical theories.
Besides creating new concepts and noticing conjectures, AM has the
key "ability" of appearing to decide rationally what to work on
at each moment. This is a result of the agenda of tasks --
containing associated reasons. Of course all of these abilities stem
from the quality and the quantity of local heuristic rules: little
plausible move generators and evaluators.
.end;
. SSSEC(Current Limitations)
Below are several shortcomings of AM, which hurt its behavior but are
not believed to be
inherent limitations of its design. They are presented in order
of decreasing severity.
Perhaps the most serious limitation on AM's current behavior arose
from the lack of constraints on left sides of heuristic rules. It
turned out that this excessive freedom made it difficult for AM to
inspect, analyze, and synthesize its own heuristics; such a need
was not foreseen at the time AM was designed. It was thought that the
power to manipulate heuristic rules was an ability which the author
must have, but which the system wouldn't require. As it turned out,
AM did successfully develop new concepts several levels deeper than
the ones it started with. But as the new concepts got further and
further away from those initial ones, they had fewer and fewer
specific heuristics filled in (since they had to be filled in by AM
itself). Gradually, AM found itself relying on heuristics which were
very general compared to the concepts it was dealing with (e.g.,
forced to use heuristics about Objects when dealing with Numbers).
Heuristics for dealing with heuristics do exist, and their number
could be increased. This is not an easy job: finding a new
meta-heuristic is a tough process. Heuristics are rarely more than
compiled hindsight; hence it's difficult to create new ones "before the fact".
Our current research is based on the hypothesis that heuristics
can be discovered and reasoned about in much the same way that
mathematical operations can. What this means is recoding each
of AM's heuristics as a full-fledged concept. This would
enable the heuristics to work on each other, as well as on
concepts corresponding to mathematical entities.
.ONCE TURN ON "{⎇"
AM has no notion of proof, proof techniques, formal validity,
heuristics for finding counterexamples, etc. Thus it never really
establishes any conjecture formally. This could probably be remedied
by adding about 25 new concepts (and their 100 new associated
heuristics) dealing with such topics. The needed concepts have been
outlined on paper, but not yet coded.
It would
probably require a few hundred hours to code and debug them.
[Lakatos 76] demonstrates the value of proof as an instrument of
discovery: As one spirals in from conjecture to theorem, along the
path of repeated criticism and improvement, one often finds
motivation for modifying concepts' definitions, or for defining
new, promising ones.
The user interface is quite primitive, and this again could be
dramatically improved with just a couple hundred hours' work. AM's
explanation system is almost nonexistent: the user must ask a
question quickly, or AM will have already destroyed the information
needed to construct an answer. A clean record of recent system
history, and a scheme for tracking down the reasons
for modifying old rules and adding new ones dynamically,
do not exist at
the level found, e.g., in MYCIN [Davis 76]. There is no trivial way to
have the system print out its heuristics in a format which is
intelligible to the untrained user.
An important type of analogy which was untapped by AM was that
between heuristics. If two situations were similar, conceivably the
heuristics useful in one situation might be useful (or have useful
analogues) in the new situation
(see [Koppelman 75]). Perhaps this is a viable way of
enlarging the known heuristics. Such "meta-level" activities were
kept to a minimum throughout AM, and this proved to be a serious
limitation.
.if false then start;
My intuition tells me that the "right" ten meta-rules
could correct this particular deficiency.
.end;
The idea of "Intuitions" facets was a flop.
Intuitions were meant to model reality, at least little pieces of it,
so that AM could perform (simulate) physical experiments, and observe
the results. The major problem here was that ⊗4so⊗* little of the world was
modelled that the only relationships derivable were those foreseen by the
author. This lack of generality was unacceptable, and the intuitions were
completely excised. The original idea might lead somewhere if it were
developed fully.
.if false then start;
As with all limitations of AM, I leave this as an open suggestion
for future research.
.end;
.if false then start;
Several limitations arose from the constraints of the agenda scheme,
from the choice of finite set theory as the domain to work in, and
from the particular model of math research that was postulated. These
will be discussed in the next few
subsections.
.end;
.if false then start; SSSEC(Limitations of the Agenda scheme)
.COMMENT Should these DIFlabels go in the sec's header?;
.DIFSECNUM: SECNUM;
.DSEC: SECNUM;
.COMMENT Not sure about that last label-- it might not be meant for
.here;
.DIFSSECNUM: SSECNUM;
.DIFSSEC: SSECNUM;
.DIFSEC: PAGE;
The following quibbles with the agenda scheme get less and less
important:
Currently, it is difficult to include heuristics which interact with
one another in any significant way. The whole fibre of the Agenda
scheme assumes perfect independence of heuristics. The global formula
used to rate tasks on the agenda assumes perfect superposition of
reasons: there are no "cross-terms". Is this assumption always
valid? Unfortunately ⊗4no⊗*, not even for the limited domain AM has
explored. Sometimes, two reasons are very similar: "Examples of Sets
would permit finding examples of Union" and "Examples of Sets would
permit finding examples of Intersection". In that case, their two
ratings shouldn't cause such a big increase in the overall priority
value of the task "⊗6Fillin examples of Sets⊗*".
Sometimes, a heuristic rule will want to ⊗4dissuade⊗* the system from
some activity. Thus a ⊗4negative⊗* numeric contribution to a task's
priority value is desired. This is not figured into the current
scheme. With a slight modification, the global formula could preserve
the sign (signum) of each reason's rating.
Tasks on the agenda list are ordered by their numeric priority value.
Each reason's numeric value is kept, too. When new reasons are
added, these values are used to recompute a new priority for the
task. Each reason's rating was computed by a little formula found
inside some heuristic rule. Those formulae are not kept hanging
around. One big improvement in apparent intelligence could be
attained by tacking on those little formulae to the reasons. When a
new reason is added, the old reasons' rating formulae would be
evaluated again. They might indeed give new numbers. For example,
suppose one reason was "Few examples of X are known". But by now,
other tasks have meanwhile inadvertently filled in several examples
of X. Then that little reason's formula would come up with a much
lower value than it did originally. In fact, the value might be so
low that the reason was dropped altogether. If the formulae were
kept, it might be good practice to evaluate them for the top two or
three tasks on the agenda, to see if they might change their
ordering. Also, the top task's priority would then be more accurate,
and recall that its value is used to determine the cpu time and list
cell space quanta that the task is allowed to use up. At the moment,
AM is not set up to store the little functions, and if modified to do
so, it uses up a lot more space than it can afford. Also, the top
few jobs are almost never semantically coupled (except by "focus of
attention"), so the precise order in which they are executed rarely
matters.
Perhaps what is needed is not a single priority value for each task,
but a vector of numbers. At each cycle, AM would construct a vector
of its current "interests" and needs, and each task's vector would be
dot-multiplied against this global vector of AM's desires. The
highest scorer would then be chosen. For example, one dimension of
the rating could be "safety", and one could be "best possible
payoff", one could be "average expected payoff", etc. Sometimes, AM
would have to break out of a stagnant situation, and it would be
willing to try riskier tasks than usual. This was not implemented
because of the great increase in cpu time it would cause. It is,
however, probably a better design than the current one. Even more
intelligent schemes can be envisioned -- involving more and more
symbolic data being stored with each task. Ultimately, this would be
just the English reasons themselves; by that time, the task-orderer
would have grown into an incredibly complex AI program itself (a
natural language program plus an interrelator plus...).
The agenda list should really be an agenda ⊗4tree⊗*$$
maybe an agenda ↓_Heap_↓.
$, since the
ordering of tasks is really just partial, not total. If this is
clear, then skip the rest of this paragraph. There are some
"legitimate" orderings of tasks on the agenda; if task X is supported
by a subset of the reasons which support Y, then typically the priority
of X will be less than or equal to the priority of Y. Two tasks
of the form "Fillin examples of A", "Fill in examples of B" can be
ordered simply because A is currently much more interesting than B.
But often, two tasks will have no ironclad ordering between them:
compare "Fillin examples of Sets" and "Check generalizations of
Union". Thus the ordering is only partial, and it is the artifice of
the global evaluation function which embeds this into a linear
ordering. If multiprocessors are used, it might be advantageous to
keep the original partial ordering around.
.end;
Currently, it is difficult to include heuristics which interact with
one another in any significant way. The whole fibre of the Agenda
scheme assumes perfect independence of heuristics. The global formula
used to rate tasks on the agenda assumes perfect superposition of
reasons: there are no "cross-terms". Is this assumption always
valid? Unfortunately ⊗4no⊗*, not even for the limited domain AM has
explored. Sometimes, two reasons are very similar: "Examples of Sets
would permit finding examples of Union" and "Examples of Sets would
permit finding examples of Intersection". In that case, their two
ratings shouldn't cause such a big increase in the overall priority
value of the task "⊗6Fillin examples of Sets⊗*".
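A minimal sketch of the difficulty (the formula shown is a deliberate simplification, not AM's actual rating function): if a task's priority is just the sum of its reasons' ratings, then two near-duplicate reasons count as though they were independent evidence:

```python
# Deliberately simplified priority formula (not AM's actual one):
# priority = superposition (sum) of the reasons' ratings, capped.

def priority(reason_ratings):
    return min(1000, sum(reason_ratings))

# Two nearly identical reasons ("...examples of Union" and
# "...examples of Intersection") each contribute full weight:
inflated = priority([400, 400])   # counted as independent evidence
honest   = priority([400])        # what one shared reason deserves
```

The inflated value is double the honest one, although the second reason adds almost no new information.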
There are a few other minor limitations of the current Agenda scheme:
(i) Sometimes, a heuristic rule will want to ⊗4dissuade⊗* the system from
some activity; so occasionally we'd like a ⊗4negative⊗* numeric contribution to a task's
priority value.
(ii) AM discards the formula which produced a reason's rating, and
keeps around only the numeric result of that formula; but many of
those formulae involve time-dependent quantities (e.g., the number
of entries on a particular facet of a particular concept).
(iii) Better than a single priority value for each task would be
for AM to maintain a vector of numbers.
(iv) The agenda list should really be an agenda ⊗4tree⊗*
(maybe an agenda ↓_Heap_↓),
since the
ordering of tasks is really just partial, not total.
The utility of this would be especially high if multiprocessor
hardware were available to run the program on.
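Point (iii) can be made concrete with a small sketch (a hypothetical illustration; none of this exists in AM): each task carries a vector of ratings, and the system picks the task whose vector best matches its current vector of "desires", via a dot product:

```python
# Hypothetical sketch of point (iii): vector-valued task ratings,
# matched against the system's current desires by a dot product.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def choose_task(tasks, desires):
    """tasks: list of (name, rating_vector); desires: current weights
    on the dimensions (safety, best payoff, average payoff)."""
    return max(tasks, key=lambda t: dot(t[1], desires))[0]

tasks = [
    ("Fillin examples of Sets",     (0.9, 0.3, 0.5)),  # safe, modest payoff
    ("Check genlizations of Union", (0.2, 0.9, 0.4)),  # risky, big payoff
]
cautious = choose_task(tasks, (1.0, 0.2, 0.5))  # normal mood
daring   = choose_task(tasks, (0.1, 1.0, 0.2))  # stagnant: take risks
```

By varying the desires vector, the same agenda yields cautious behavior in normal times and riskier choices when the system is stagnating.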
. SSSEC(Limiting Assumptions)
.LIMASS: PAGE;
.ASSUMPS: SSSECNUM;
AM only "got off the ground" because a number of sweeping assumptions
were made, pertaining to what could be ignored, how a complex process
could be adequately simulated, etc. Now that AM ⊗4is⊗* running,
however, those same simplifications crop up as limitations to the
system's behavior. Each of the following points is a `convenient
falsehood'. Although the reader has already been told about some of
these, it's worth listing them all together here:
.BN; PREFACE 1; AT ",," ⊂ "⊗8# ⊗*" ⊃;
,, The only communication necessary from AM to the user is keeping
the user informed of what AM is doing. No natural language ability is
required by AM; simple template instantiation is sufficient.
,, The only communication from the user to AM is an occasional
interrupt, when the user wishes to provide some guidance or to pose a
query. Both of these can be stereotyped and passed easily through a
very narrow channel.$$ E.g., a set of escape characters, so α↑W means
`⊗4Why did you do that?⊗*', α↑U means `⊗4Uninteresting! Go on to
something else⊗*', etc. $
,, Each heuristic has a well-defined domain of applicability, which
can be specified just by giving the name of a single concept.
,, If concept C1 is more specialized than C2, then C1's heuristics
will be more powerful and should be executed before C2's (whenever
both concepts' heuristics are relevant).
,, If h1 and h2 are two heuristics attached to concept C, then it is
not necessary to spend any time ordering them.
,, Heuristics superimpose perfectly; they never interact strongly
with each other.
,, The reasons supporting a task can be mere tokens; it suffices to
be able to inspect them for equality. They need not follow a constrained
syntax. The value of a reason is adequately
characterized by a unidimensional
numeric rating.
,, The reasons supporting a task superimpose perfectly; they never interact
with each other.
,, Supporting reasons -- and their ratings -- never change
with time, with one exception: the ephemeron `Focus of attention'.
,, It doesn't matter in what order the supporting reasons for a task were added.
,, There is no need for negative or inhibitory reasons, which would decrease the
priority value of a task.
,, At any moment, the top few tasks on the agenda are not coupled
strongly; it is not necessary to expend extra processing time to
carefully order them.
,, The tasks on the agenda are completely independent of each other,
in the sense of one task `enabling' or `waking-up' another.
.ONCE TURN ON "{⎇";
,, Mathematics research has a clean, simple model (see Section
{[2]MODELSEC⎇.{[2]MODELSSEC⎇.{[2]MODELSSSEC⎇, page {[3]MODELPAGE⎇),
which indicates that it is a search process governed by a large
collection of heuristic rules.
,, Elementary mathematics is such that valuable new concepts will be
discovered fairly regularly.
,, The worth of each new concept can be
estimated easily, after just a brief investigation.
,, Contradictions
will arise very rarely,
and it is not disastrous to ignore them when they do occur.
The same indifference applies to the danger of believing in false conjectures.
,, When doing theory formation in elementary mathematics,
proof and formal reasoning are dispensable.
,, Even as more knowledge is obtained, the set of facets need never
change.
,, For any piece of knowledge sought or obtained, there is precisely
one facet of one existing$$ The only allowable exception is that a
new piece of information might require the creation of a brand new
concept, and then require storage somewhere on that concept. $
concept where that knowledge ought to be stored, and it is easy to
determine that proper location.
,, Even as more concepts are defined, the body of heuristics need not
grow much.
,, Any common-sense knowledge required by AM is automatically present
within the heuristic rules. So, e.g., no special spatial
visualization abilities are needed.
.E
It is worth repeating here that the above assumptions are all clearly
⊗4false⊗*. Yet none of them was too damaging to AM's behavior, and their combined
presence made the creation of AM feasible.
. SSSEC(Choice of Domain)
.QQ
The genesis of mathematical creation is a problem which should intensely
interest the psychologist. It is the activity in which the
human mind seems to take least from the outside world,
in which it acts or seems to act only of itself and on itself, so that
in studying the procedure of mathematical thought we may hope to
reach what is most essential in man's mind.
-- Poincare'
.ESS
Here are some questions this subsection will address:
.BN
⊗8# ⊗* What are the inherent limitations -- and advantages -- in
fixing a domain for AM to work in?
⊗8# ⊗* What characteristics are favorable to automating research in
any given domain?
⊗8# ⊗* What are the specific reasons for and against elementary
finite set theory as the chosen starting domain?
.E
Research in various domains of science and math proceeds slightly
differently. For example, psychology is interested in explaining
people, not in creating new kinds of people. Math is not interested
in individual entities so much as in new kinds of entities. There are
ethical restrictions on physicians which prevent certain experiments
from being done. Political experiments rarely permit backtracking,
etc. Each field has its own peculiarities.
If we want a system to work in many domains, we have to sacrifice
some power.$$ This is assuming a system of a given fixed size. If
this restriction isn't present, then a reasonable "general-purpose"
system could be built as several systems linked by one giant
switch.$ Within a given field of knowledge (like math), the finer the
category we limit ourselves to, the more specific are the heuristics
which become available. So it was reasonable to make this first
attempt limited to one narrow domain.
This brings up the choice of domain. What should it be? As the
DENDRAL project illustrated so clearly$$ see [Feigenbaum et al. 71].
In that case, the choice of subject was enabled by [Lederberg 64]. $,
choice of subject domain is quite important when studying how
researchers discover and develop their theories. Mathematics was
chosen as the domain of this investigation, because
.BN
λλ In doing math research, one needn't cope with the uncertainties
and fallibility of testing equipment; that is, there are no
uncertainties in the data (compared to, e.g., molecular structure inference
from mass spectrograms).
λλ Reliance on experts' introspections is one of the most powerful
techniques for codifying the judgmental criteria necessary to do
effective work in a field; I personally have had enough training in
elementary mathematics so that I didn't have to rely completely on
external sources for guidance in formulating such heuristic rules.
Also, several excellent sources were available [Polya, Skemp,
Hadamard, Kershner, etc.].
λλ The more formal a science is, the easier it is to automate. For a machine to
carry out research in psychology would require more knowledge about
human information processing than now is known, because psychology
deals with entities as complex as you and I. Also, in a formal science,
the ⊗4languages⊗* to communicate information can be simple even
though the ⊗4messages⊗* themselves may be sophisticated.
λλ Since mathematics can deal with any conceivable constructs, a
researcher there is not limited to explaining observed data. Related
to this is the freedom to investigate -- or to give up on -- whatever
the researcher wants to. There is no single discovery which is the
"goal", no given problem to solve, no right or wrong behavior.
λλ Unlike "simpler" fields, such as propositional logic, there is an
abundance of heuristic rules available for the picking.
.E
The limitations of math as a domain are closely intertwined with its
advantages. Having no ties to real-world data can be viewed as a
limitation, as can having no clear goal. There is always the danger
that AM will give up on each theory as soon as the first tough
obstacle crops up.
Since math has been worked on for millennia by some of the greatest
minds from many different cultures, it is unlikely that a small
effort like AM would make any new inroads, have any startling
insights. In that respect, DENDRAL's space was much less explored.
Of course
math -- even at the elementary level at which AM explored it -- still has
undiscovered gems (e.g., the recent unearthing of Conway's numbers [Knuth 74]).
One point of agreement between Weizenbaum and Lederberg$$
See the quote at the front of the next subsection. It is from [Lederberg 76],
a review of [Weizenbaum 76].
This review also
exists as file WEIZEN.LED[pub,jmc]α@SAIL.
$ is that AI can succeed in automating an activity only
when a "strong theory" of that activity exists. AM is built on a
detailed model of how humans do math research. In the next
subsection, we'll discuss the model of math research that AM assumes.
.if false then start;
Before that, consider for a moment how few other fields of human
endeavor have a good model, and also enjoy all the advantages listed
above: other domains of math, classical physics,... not many others.
.end;
. SSSEC(Limitations of the Model of Math Research)
.QQ
Weizenbaum does point to projects in mathematics and
chemistry where computers have shown their potential for
assisting human scientists in solving problems. He correctly
points out that these successes are based on the existence of
"strong theories" about their subject matter.
-- Lederberg
.ESS
.ONCE TURN ON "{⎇";
AM, like anything else in this world, is constrained by a mass of
assumptions. Most of these are "compiled" or interwoven into the very
fabric of AM, hence can't be tested by experiments on AM.
Some of these were just discussed a few pages ago, in Section
{SECNUM⎇.{SSECNUM⎇.{[3]ASSUMPS⎇.
Another body of assumptions exists.
AM is
built around a particular model of how mathematicians actually go
about doing their research. This model was derived from
introspection, but can be supported by quotes from Polya, Kershner,
Hadamard, Saaty, Skemp, and many others. No attempt will be made to justify
any of these premises. On the next page is a simplified summary of
that information processing model for math theory formation:
.COMMENT GROUP
. ONCE TURN OFF "α" TURN ON "∞→" ⊗8⊂∞α→⊃⊗*
. NARROW 2,2;
.COMMENT BOXTOP(1);
.SKIP TO COLUMN 1;
.GROUP SKIP 1;
.ONCE CENTER SELECT 5;
MODEL OF MATH RESEARCH
.GROUP SKIP 1;
. TOPSIDE←TOPLINE+LINE-1;
. BOTTOMSIDE←46;
. BSIDE←49;
.AREA MODL LINES TOPSIDE TO BSIDE;
.AREA BOXING LINES TOPSIDE TO BSIDE;
.PLACE BOXING
.BEGIN;
.NOFILL; PREFACE 0 MILLS
.SELECT 8
.INDENT 0
.TABS 75
.TURN ON "\∞→"
⊂∞α→\⊃
.K←BOTTOMSIDE-TOPSIDE-1
.REPEAT ⊂
}∞ →\}
.K←K-1; IF K≤0 THEN DONE
.⊃
%∞α→\$
.END
.MODELPAGE: PAGE;
.MODELSSEC: SSECNUM;
.PLACE MODL;
. FILL; BN;
. SPACING 0 MILLS; PREFACE 0 MILLS;
. GROUP SKIP 2;
. INDENT 2,6,3;
.MODELSSSEC: SSSECNUM;
.MODELSEC: SECNUM;
λλ The order in which a math textbook presents a theory is almost the
exact opposite of the order in which it was actually discovered and
developed. In a text, new definitions are stated with little or no
motivation, and they turn out to be just the ones needed to state the
next big theorem, whose proof then magically appears. In contrast, a
mathematician doing research will examine some already-known
concepts, perhaps trying to find some regularity in experimental data
involving them. The patterns he notices are the conjectures he must
investigate further, and these relationships directly motivate him to
make new definitions.
.COMMENT APART;
λλ Each step the researcher takes while developing a new theory
involves choosing from a large set of "legal" alternatives -- that
is, searching. The key to keeping this from becoming a blind,
explosive search is the proper use of evaluation criteria. Each
mathematician uses his own personal heuristics to choose the "best"
alternative available at each moment.
λλ Non-formal criteria (aesthetic interestingness, inductive
inference from empirical evidence, analogy, and utility) are much
more important than formal deductive methods in developing
mathematically worthwhile theories, and in avoiding barren
diversions.
λλ Progress in ⊗4any⊗* field of mathematics demands much non-formal
heuristic expertise in ⊗4many⊗* different "nearby" mathematical
fields. So a broad, universal core of knowledge must be mastered
before any single theory can meaningfully be developed.
λλ It is sufficient (and pragmatically necessary) to have and use a
large set of informal heuristic rules. These rules direct the
researcher's next activities, depending on the current situation he
is in. These rules can be assumed to superimpose ideally: the
combined effect of several rules is just the sum of the individual
effects.
λλ The necessary heuristic rules are virtually the same in all
branches of mathematics, and at all levels of sophistication. Each
specialized field will have some of its own heuristics; those are
normally much more powerful than the general-purpose heuristics.
λλ For true understanding, the researcher should grasp$$
Have access
to, relate to, store, be able to manipulate, be able to answer
questions about $ each concept in several ways: declaratively,
abstractly, operationally, knowing when it is relevant, and as a
bunch of examples.
λλ Common metaphysical assumptions about nature and science: Nature
is fair, uniform, and regular. Coincidences have meaning.
Statistical considerations are valid when looking at mathematical
data. Simplicity and symmetry and synergy are the rule, not the
exception.
.COMMENT WIDEN 2,2
. ONCE SELECT 8 TURN OFF "α" TURN ON "∞→" %∞α→$;
. COMMENT BOXB;
.END;
.SKIP TO COLUMN 1
. SSSEC(Ultimate powers and weaknesses)
Consider now ⊗4any⊗* system which is consistent with the preceding
model of math research, and whose orientation is to discover and
develop new (to the system) mathematical theories. This includes AM
itself, but might also include a bright high-school senior who has
been taught a large body of heuristic rules.
What can such systems ultimately achieve? What are their ultimate
limits? Answers to ultimate questions are hard to come by
experimentally, so this discussion will be quite philosophical,
speculative, and short. The model of math research hinges around the
use of heuristic rules for guidance at all levels of behavior. It is
questionable whether or not all known mathematics could evolve
smoothly in this way. As a first order fixup, we've mentioned the
need to provide good meta-heuristics, to keep enlarging the set of
heuristics. If this is not enough (if meta-meta-...-heuristics are
needed), then the model is a poor one and has some inherent
limitations.$$ If Ptolemy had had access to a digital computer, all
his data could have been made to fit (to any desired accuracy), just
by computing epi-cycles, epi-epi-cycles,... to the needed number of
epi's. We in AI must constantly be on guard against that error. $ If
some discoveries can only be made non-rationally (by random chance, by
Gestalt, etc.)
then any such system would be incapable of finding those concepts.
Turning aside from math, what about systems whose design -- as a
computer program -- is similar to AM?$$ Having an agenda of tasks
with reasons and reason-ratings combining to form a global priority
for each task, having
units/modules/frames/Beings/Actors/scripts/concepts
which have parts/slots/aspects/facets, etc. Heuristic rules are tacked onto
relevant concepts, and are executed to produce new concepts, new
tasks, new facet entries. $ Building such systems will be "fun", and
perhaps will result in new discoveries in other fields. Eventually,
scientists (at least in a few very hard domains) may relegate more
and more of their "hack" research duties to AM-like systems. The
ultimate limitations will be those arising from incorrect (e.g.,
partial) models of the activities the system must perform. The
systems themselves may help improve these models: experiments that
are performed on the systems are actually tests of the underlying
model; the results might cause revisions to be made in the model,
then in the system, and the whole cycle would begin again.
.SSEC(Final Conclusions)
Before quitting, let's summarize what's worth remembering about this
research project.
.BEGIN INDENT 0,4,0; TURN ON "{⎇"
⊗8## ⊗* It is a demonstration that a few hundred general heuristic
rules suffice to guide an automated math researcher as it explores
and expands a large but incomplete knowledge base of math concepts.
AM serves as a living existence proof that creative research can be
effectively modelled as heuristic search.
⊗8## ⊗* The thesis also introduces a control structure based upon an
agenda of small research tasks, each with a list of supporting
reasons attached.
⊗8## ⊗* The main limitation of AM was its inability to synthesize
powerful new heuristics for the new concepts it defined.
⊗8## ⊗* The main successes were the few novel ideas it came up with,
the ease with which a new task domain was fed to the system, and --
most importantly -- the overall rational sequences of behavior AM
exhibited.
⊗8## ⊗* The greatest long-range importance of AM may well lie in the
body of heuristics assembled, either as the
seed for a huge base of experts' heuristics, or as a new orientation
for mathematics education.
.E
.SKIP TO COLUMN 1;
.ENDTEXT: PAGE;